Manifold structure in graph embeddings

Neural Information Processing Systems

Statistical analysis of a graph often starts with embedding, the process of representing its nodes as points in space. How to choose the embedding dimension is a nuanced decision in practice, but in theory a notion of true dimension is often available. In spectral embedding, this dimension may be very high. However, this paper shows that existing random graph models, including graphon and other latent position models, predict the data should live near a much lower-dimensional set. One may therefore circumvent the curse of dimensionality by employing methods which exploit hidden manifold structure.
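The pipeline the abstract describes can be sketched minimally: spectrally embed an adjacency matrix, then inspect the point cloud for low-dimensional structure. The stochastic block model parameters and the scaled-eigenvector construction below are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def spectral_embedding(A, d):
    """Adjacency spectral embedding: top-d eigenvectors of the (symmetric)
    adjacency matrix, scaled by the square roots of |eigenvalue|."""
    vals, vecs = np.linalg.eigh(A)
    idx = np.argsort(np.abs(vals))[::-1][:d]   # top-d by magnitude
    return vecs[:, idx] * np.sqrt(np.abs(vals[idx]))

# Toy graph: a two-block stochastic block model (parameters assumed).
rng = np.random.default_rng(0)
B = np.array([[0.6, 0.1], [0.1, 0.6]])
z = np.repeat([0, 1], 50)                      # block memberships
P = B[np.ix_(z, z)]
A = rng.random((100, 100)) < P
A = np.triu(A, 1)
A = (A + A.T).astype(float)                    # symmetric, hollow
X = spectral_embedding(A, 2)                   # 100 points in R^2
```

For a model like this, the embedded points concentrate near a low-dimensional set (here, two clusters), which is the kind of hidden structure the paper argues manifold-aware methods can exploit.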


Statistical Inference for Manifold Similarity and Alignability across Noisy High-Dimensional Datasets

Chen, Hongrui, Ma, Rong

arXiv.org Machine Learning

The rapid growth of high-dimensional datasets across various scientific domains has created a pressing need for new statistical methods to compare distributions supported on their underlying structures. Assessing similarity between datasets whose samples lie on low-dimensional manifolds requires robust techniques capable of separating meaningful signal from noise. We propose a principled framework for statistical inference of similarity and alignment between distributions supported on manifolds underlying high-dimensional datasets in the presence of heterogeneous noise. The key idea is to link the low-rank structure of observed data matrices to their underlying manifold geometry. By analyzing the spectrum of the sample covariance under a manifold signal-plus-noise model, we develop a scale-invariant distance measure between datasets based on their principal variance structures. We further introduce a consistent estimator for this distance and a statistical test for manifold alignability, and establish their asymptotic properties using random matrix theory. The proposed framework accommodates heterogeneous noise across datasets and offers an efficient, theoretically grounded approach for comparing high-dimensional datasets with low-dimensional manifold structures. Through extensive simulations and analyses of multi-sample single-cell datasets, we demonstrate that our method achieves superior robustness and statistical power compared with existing approaches.
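The paper's distance rests on random-matrix analysis of the sample covariance; the toy sketch below only illustrates the core idea of a scale-invariant comparison of principal variance structures. The function names and the top-r normalisation are assumptions, not the paper's estimator.

```python
import numpy as np

def principal_variance_profile(X, r):
    """Top-r eigenvalues of the sample covariance, normalised to sum to
    one so the profile is invariant to global rescaling of the data."""
    S = np.cov(X, rowvar=False)
    vals = np.sort(np.linalg.eigvalsh(S))[::-1][:r]
    return vals / vals.sum()

def spectral_distance(X, Y, r=5):
    """Distance between two datasets based on their variance profiles."""
    return np.linalg.norm(principal_variance_profile(X, r)
                          - principal_variance_profile(Y, r))

rng = np.random.default_rng(1)
A = rng.normal(size=(500, 20))
d_scaled = spectral_distance(A, 3.0 * A)   # rescaling leaves it ~0
```

Because covariance eigenvalues all scale by the same factor under global rescaling, the normalised profiles coincide, which is the scale-invariance property the abstract highlights.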



Diffusion Models and the Manifold Hypothesis: Log-Domain Smoothing is Geometry Adaptive

Farghly, Tyler, Potaptchik, Peter, Howard, Samuel, Deligiannidis, George, Pidstrigach, Jakiw

arXiv.org Machine Learning

Diffusion models have achieved state-of-the-art performance, demonstrating remarkable generalisation capabilities across diverse domains. However, the mechanisms underpinning these strong capabilities remain only partially understood. A leading conjecture, based on the manifold hypothesis, attributes this success to their ability to adapt to low-dimensional geometric structure within the data. This work provides evidence for this conjecture, focusing on how such phenomena could result from the formulation of the learning problem through score matching. We inspect the role of implicit regularisation by investigating the effect of smoothing minimisers of the empirical score matching objective. Our theoretical and empirical results confirm that smoothing the score function -- or equivalently, smoothing in the log-density domain -- produces smoothing tangential to the data manifold. In addition, we show that the manifold along which the diffusion model generalises can be controlled by choosing an appropriate smoothing.
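The equivalence the abstract states -- smoothing the score is smoothing in the log-density domain -- can be made concrete with the simplest smoothed score: the gradient of the log of a Gaussian-mollified empirical distribution. This is a toy illustration of the object being analysed, not the paper's construction; all names are assumptions.

```python
import numpy as np

def smoothed_score(x, data, sigma):
    """Score (gradient of log-density) of the Gaussian-smoothed empirical
    distribution of `data`, evaluated at point x."""
    w = np.exp(-np.sum((x - data) ** 2, axis=1) / (2 * sigma ** 2))
    w /= w.sum()                      # softmax weights over data points
    return (w @ data - x) / sigma ** 2

# With one data point the smoothed density is an isotropic Gaussian,
# so the score points straight back at that point.
data = np.array([[2.0, 0.0]])
x = np.array([0.0, 0.0])
s = smoothed_score(x, data, sigma=1.0)
```

With data concentrated on a manifold, scores of this form pull points toward the data, and the paper's claim is that smoothing such minimisers acts tangentially to that manifold.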


Generalized Unsupervised Manifold Alignment

Neural Information Processing Systems

In this paper, we propose a generalized Unsupervised Manifold Alignment (GUMA) method to build the connections between different but correlated datasets without any known correspondences. Based on the assumption that datasets of the same theme usually have similar manifold structures, GUMA is formulated into an explicit integer optimization problem considering the structure matching and preserving criteria, as well as the feature comparability of the corresponding points in the mutual embedding space. The main benefits of this model include: (1) simultaneous discovery and alignment of manifold structures; (2) fully unsupervised matching without any pre-specified correspondences; (3) efficient iterative alignment without computations in all permutation cases. Experimental results on dataset matching and real-world applications demonstrate the effectiveness and the practicability of our manifold alignment method.
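GUMA's explicit integer program is far richer than this, but the structure-matching intuition -- correspond points across datasets using only intra-set geometry, with no pre-specified correspondences -- can be sketched with a linear assignment on rotation-invariant distance profiles. Everything below is an illustrative assumption, not the paper's algorithm.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def align_by_structure(X, Y):
    """Match points of X to points of Y using only intra-set geometry:
    describe each point by its sorted distances to all others, then solve
    the assignment minimising profile mismatch."""
    def profiles(Z):
        D = np.linalg.norm(Z[:, None] - Z[None, :], axis=-1)
        return np.sort(D, axis=1)
    C = np.linalg.norm(profiles(X)[:, None] - profiles(Y)[None, :], axis=-1)
    rows, cols = linear_sum_assignment(C)
    return cols                        # cols[i]: index in Y matched to X[i]

# Y is a rotated, permuted copy of X; no correspondences are given.
rng = np.random.default_rng(2)
X = rng.normal(size=(30, 3))
perm = rng.permutation(30)
Q = np.linalg.qr(rng.normal(size=(3, 3)))[0]   # random rotation
Y = X[perm] @ Q
match = align_by_structure(X, Y)
```

Since distances are invariant to the rotation, the profiles of corresponding points agree exactly and the assignment recovers the hidden permutation -- the fully unsupervised matching regime the abstract describes.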


Supplementary material for Manifold structure in graph embeddings

Neural Information Processing Systems

Proof of Theorem 3. We have …

Lemma 4. Consider a polynomial kernel over a bounded region Z ⊂ R^d. The associated integral operator has finite rank.

Lemma 5. Suppose f is analytic on Z with f(x, y) = …

Figure 4: Kernel density ridge sets (red) as estimates of the underlying manifold (blue), for embeddings of simulated graphs described in Section 2 and also shown in Figure 1.


FedMP: Tackling Medical Feature Heterogeneity in Federated Learning from a Manifold Perspective

Zhou, Zhekai, Liu, Shudong, Zhou, Zhaokun, Liu, Yang, Yang, Qiang, Zhu, Yuesheng, Luo, Guibo

arXiv.org Artificial Intelligence

Federated learning (FL) is a decentralized machine learning paradigm in which multiple clients collaboratively train a shared model without sharing their local private data. However, real-world applications of FL frequently encounter challenges arising from the non-identically and independently distributed (non-IID) local datasets across participating clients, which is particularly pronounced in the field of medical imaging, where shifts in image feature distributions significantly hinder the global model's convergence and performance. To address this challenge, we propose FedMP, a novel method designed to enhance FL under non-IID scenarios. FedMP employs stochastic feature manifold completion to enrich the training space of individual client classifiers, and leverages class-prototypes to guide the alignment of feature manifolds across clients within semantically consistent subspaces, facilitating the construction of more distinct decision boundaries. We validate the effectiveness of FedMP on multiple medical imaging datasets, including those with real-world multi-center distributions, as well as on a multi-domain natural image dataset. The experimental results demonstrate that FedMP outperforms existing FL algorithms. Additionally, we analyze the impact of manifold dimensionality, communication efficiency, and privacy implications of feature exposure in our method.
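The prototype-guided alignment idea can be reduced to a toy objective: pull each client feature toward the global prototype of its class. FedMP's actual mechanism (stochastic feature manifold completion plus subspace alignment) is richer; the loss, shapes, and values below are illustrative assumptions only.

```python
import numpy as np

def prototype_alignment_loss(features, labels, global_protos):
    """Mean squared distance from each feature to the global class
    prototype of its label -- a minimal prototype-alignment penalty."""
    diffs = features - global_protos[labels]
    return float(np.mean(np.sum(diffs ** 2, axis=1)))

# Hypothetical client batch: four features in R^2, two classes.
feats = np.array([[1.0, 0.0], [1.0, 2.0], [-1.0, 0.0], [-1.0, -2.0]])
labels = np.array([0, 0, 1, 1])
protos = np.array([[1.0, 1.0], [-1.0, -1.0]])
loss = prototype_alignment_loss(feats, labels, protos)
```

Minimising a term like this on every client pushes the per-client feature manifolds toward semantically consistent regions, which is the alignment effect the abstract attributes to class prototypes.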


Locality Preserving Markovian Transition for Instance Retrieval

Luo, Jifei, Wu, Wenzheng, Yao, Hantao, Yu, Lu, Xu, Changsheng

arXiv.org Artificial Intelligence

Diffusion-based re-ranking methods are effective in modeling the data manifolds through similarity propagation in affinity graphs. However, positive signals tend to diminish over several steps away from the source, reducing discriminative power beyond local regions. To address this issue, we introduce the Locality Preserving Markovian Transition (LPMT) framework, which employs a long-term thermodynamic transition process with multiple states for accurate manifold distance measurement. The proposed LPMT first integrates diffusion processes across separate graphs using Bidirectional Collaborative Diffusion (BCD) to establish strong similarity relationships. Afterwards, Locality State Embedding (LSE) encodes each instance into a distribution for enhanced local consistency. These distributions are interconnected via the Thermodynamic Markovian Transition (TMT) process, enabling efficient global retrieval while maintaining local effectiveness. Experimental results across diverse tasks confirm the effectiveness of LPMT for instance retrieval.
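The similarity propagation that diffusion-based re-ranking builds on is the classic manifold-ranking recursion f ← αSf + (1 − α)y on a normalised affinity graph. The sketch below shows only that baseline mechanism -- and the signal decay away from the query that motivates LPMT -- not the paper's BCD/LSE/TMT stages; the graph and parameters are assumptions.

```python
import numpy as np

def diffusion_rerank(A, query_idx, alpha=0.9, iters=50):
    """Manifold-ranking style propagation: relevance scores spread from
    the query node along the symmetrically normalised affinity graph."""
    D = A.sum(axis=1)
    S = A / np.sqrt(np.outer(D, D))            # D^{-1/2} A D^{-1/2}
    f = np.zeros(len(A))
    y = np.zeros(len(A))
    y[query_idx] = 1.0                          # source at the query
    for _ in range(iters):
        f = alpha * (S @ f) + (1 - alpha) * y
    return np.argsort(-f)                       # indices, best first

# Toy affinity: a chain graph, query at one end.
n = 6
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
ranking = diffusion_rerank(A, 0)
```

On the chain, scores fall off monotonically with graph distance from the query, illustrating the diminishing positive signal beyond local regions that the paper sets out to fix.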


MPEC: Manifold-Preserved EEG Classification via an Ensemble of Clustering-Based Classifiers

Shahbazi, Shermin, Nasiri, Mohammad-Reza, Ramezani, Majid

arXiv.org Artificial Intelligence

Abstract -- Accurate classification of EEG signals is crucial for brain-computer interfaces (BCIs) and neuroprosthetic applications, yet many existing methods fail to account for the non-Euclidean, manifold structure of EEG data, resulting in suboptimal performance. Preserving this manifold information is essential to capture the true geometry of EEG signals, but traditional classification techniques largely overlook this need. To this end, we propose MPEC (Manifold-Preserved EEG Classification via an Ensemble of Clustering-Based Classifiers), which introduces two key innovations: (1) a feature engineering phase that combines covariance matrices and Radial Basis Function (RBF) kernels to capture both linear and non-linear relationships among EEG channels, and (2) a clustering phase that employs a modified K-means algorithm tailored to the Riemannian manifold space, ensuring local geometric sensitivity. By ensembling multiple clustering-based classifiers, MPEC achieves superior results, validated by significant improvements on the BCI Competition IV dataset 2a. Keywords -- brain-computer interfaces (BCIs), EEG signal classification, ensemble modeling, clustering-based classification.
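The first of MPEC's two innovations -- combining covariance matrices with RBF kernels over channels -- can be sketched as a per-trial feature extractor. The trial shape, bandwidth, and flattening scheme below are assumptions for illustration, not the paper's exact pipeline.

```python
import numpy as np

def covariance_rbf_features(trial, gamma=0.1):
    """Per-trial features combining a channel covariance matrix (linear
    relationships) with an RBF kernel matrix over channels (non-linear
    relationships), both flattened to their upper triangles."""
    C = np.cov(trial)                                # channels x channels
    sq = np.sum(trial ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * trial @ trial.T
    K = np.exp(-gamma * np.maximum(d2, 0.0))         # RBF over channels
    iu = np.triu_indices(len(C))
    return np.concatenate([C[iu], K[iu]])

rng = np.random.default_rng(3)
eeg_trial = rng.normal(size=(8, 250))   # 8 channels x 250 samples (assumed)
feat = covariance_rbf_features(eeg_trial)
```

Features of this form would then feed the clustering phase; working directly with the covariance matrices on their Riemannian manifold, as MPEC's modified K-means does, is a further refinement not shown here.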